[NAS Backup] Suppress Errors in Disk Usage Calculation that Caused Backup to Fail.#13424
[NAS Backup] Suppress Errors in Disk Usage Calculation that Caused Backup to Fail.#13424daviftorres wants to merge 22 commits into
Conversation
Pulling upstream.
Handle potential errors when calculating disk usage.
|
This is the equivalent command for applying the fix: We haven't confirmed the exact root cause of the So, I am running tests with |
Add timeout for unmounting backup mount point and cleanup.
Proposed Changes Rationalebackup_size=$(du -sb "$dest" 2>/dev/null | cut -f1) || true
timeout 60 umount "$mount_point" 2>/dev/null || true
rmdir "$mount_point" 2>/dev/null || true
echo -n "$backup_size"
|
|
Dear @abh1sar , do you think you can help me with this bug? Regards, |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #13424 +/- ##
=========================================
Coverage 18.94% 18.94%
- Complexity 18363 18365 +2
=========================================
Files 6192 6192
Lines 556361 556361
Branches 67908 67908
=========================================
+ Hits 105397 105404 +7
+ Misses 439393 439381 -12
- Partials 11571 11576 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR adjusts the KVM NAS backup script’s “statistics/cleanup” section so that failures while computing backup disk usage (and related cleanup commands) don’t cause an otherwise successful backup job to be marked as failed.
Changes:
- Capture
duoutput intobackup_sizeand suppressdustderr to avoid failing the script during size calculation. - Add
timeoutaroundumountand suppress errors fromumount/rmdir. - Emit the computed backup size at the end of
backup_running_vm().
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| umount $mount_point | ||
| rmdir $mount_point | ||
| backup_size=$(du -sb "$dest" 2>/dev/null | cut -f1) || true |
There was a problem hiding this comment.
Hi @daviftorres
Do we know the exact reason for du to fail? is that a connectivity issue?
This is the only place where the size is set on the backup. And a 0 size will affect resource limit and usage calculations.
May I suggest adding a retry loop instead?
| rmdir $mount_point | ||
| backup_size=$(du -sb "$dest" 2>/dev/null | cut -f1) || true | ||
|
|
||
| timeout 60 umount "$mount_point" 2>/dev/null || true |
There was a problem hiding this comment.
I am ok with this, but can we define 60 as a constant at the top of the file?
| timeout 60 umount "$mount_point" 2>/dev/null || true | ||
| rmdir "$mount_point" 2>/dev/null || true |
There was a problem hiding this comment.
Can we log the error (using log -e) instead of silently returning true here?
abh1sar
left a comment
There was a problem hiding this comment.
hi @daviftorres
would it be possible to test script changes in your env where it is reproducible?
Some log outputs would be nice.
Co-authored-by: Abhisar Sinha <63767682+abh1sar@users.noreply.github.com>
| backup_size=$(du -sb "$dest" 2>>$logFile | cut -f1) || { log -ne "WARNING: du failed for $dest, reporting size | ||
| as 0"; backup_size=0; } | ||
|
|
||
| timeout $UNMOUNT_TIMEOUT umount "$mount_point" 2>>$logFile || { log "WARNING: umount of $mount_point failed or timed out"; true; } | ||
| rmdir "$mount_point" 2>>$logFile || { log "WARNING: rmdir of $mount_point failed"; true; } |
Description
This PR tried to prevent the failure of the job at the statistics section of a backup that has actually succeeded.
Apparently, it also fixes some silent failures I previously reported in #11727
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?